Abstract

We define and study the problem of predicting the solution to a linear program (LP) 
given only partial information about its objective and constraints. This generalizes 
the problem of learning to predict the purchasing behavior of a rational agent who
has an unknown objective function, which has been studied under the name “Learning
from Revealed Preferences”. We give mistake bound learning algorithms in two
settings; in the first, the objective of the LP is known to the learner but there is an 
arbitrary, fixed set of constraints which are unknown. Each example is defined by 
an additional known constraint and the goal of the learner is to predict the optimal 
solution of the LP given the union of the known and unknown constraints. This 
models the problem of predicting the behavior of a rational agent whose goals 
are known, but whose resources are unknown. In the second setting, the objective 
of the LP is unknown, and changing in a controlled way. The constraints of the 
LP may also change every day, but are known. An example is given by a set of 
constraints and partial information about the objective, and the task of the learner 
is again to predict the optimal solution of the partially known LP. 


1 Introduction 

We initiate the systematic study of a general class of multi-dimensional prediction problems, where 
the learner wishes to predict the solution to an unknown linear program (LP), given some partial 
information about either the set of constraints or the objective. In the special case in which there is a
single known constraint that is changing and the objective is unknown and fixed, this problem
has been studied under the name learning from revealed preferences [1, 2, 3, 16] and captures the
following scenario: a buyer, with an unknown linear utility function over d goods $u : \mathbb{R}^d \to \mathbb{R}$
defined as $u(x) = c \cdot x$, faces a purchasing decision every day. On day t, she observes a set of prices
$p^t \in \mathbb{R}^d_{\geq 0}$ and buys the bundle of goods that maximizes her unknown utility, subject to a budget b:

$$x^{(t)} = \arg\max_x \; c \cdot x \quad \text{such that} \quad p^t \cdot x \leq b$$

In this problem, the goal of the learner is to predict the bundle that the buyer will buy, given the
prices that she faces. Each example at day t is specified by the vector $p^t \in \mathbb{R}^d_{\geq 0}$ (which fixes the
constraint), and the goal is to accurately predict the purchased bundle $x^{(t)} \in [0, 1]^d$ that is the result
of optimizing the unknown linear objective.
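To make the buyer's problem concrete, the following is a minimal sketch of how the purchased bundle could be computed when c, the prices and the budget are all known. It assumes bundles are restricted to the unit box $[0, 1]^d$ and that all prices are strictly positive, in which case the LP is a fractional knapsack and a greedy rule is optimal. The function name `buyer_bundle` is hypothetical and not part of the paper.

```python
from fractions import Fraction

def buyer_bundle(c, p, b):
    """Return argmax c.x subject to p.x <= b and 0 <= x_i <= 1 (assumes all p_i > 0)."""
    d = len(c)
    x = [Fraction(0)] * d
    budget = Fraction(b)
    # Only goods with positive utility are worth buying; rank them by utility per unit price.
    order = sorted((i for i in range(d) if c[i] > 0),
                   key=lambda i: Fraction(c[i]) / Fraction(p[i]), reverse=True)
    for i in order:
        if budget <= 0:
            break
        # Buy as much of good i as the box constraint and the remaining budget allow.
        take = min(Fraction(1), budget / Fraction(p[i]))
        x[i] = take
        budget -= take * Fraction(p[i])
    return x
```

For example, with utilities c = (3, 2, 1), prices p = (2, 1, 1) and budget b = 2, the buyer takes all of the second good and half of the first. The learner in the revealed preferences problem sees only p and the resulting bundle, never c.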

It is also natural to consider the class of problems in which the goal is to predict the outcome of a LP
more broadly, e.g. suppose the objective $c \cdot x$ is known but there is an unknown set of constraints $Ax \leq b$.
An instance is again specified by a changing known constraint $(p^t, b^t)$ and the goal is to predict:

$$x^{(t)} = \arg\max_x \; c \cdot x \quad \text{such that} \quad Ax \leq b \ \text{ and } \ p^t \cdot x \leq b^t. \qquad (1)$$


30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain. 




This models the problem of predicting the behavior of an agent whose goals are known, but whose 
resource constraints are unknown. 

Another natural generalization is the problem in which the objective is unknown, and may vary in a 
specified way across examples, and in which there may also be multiple arbitrary known constraints 
which vary across examples. Specifically, suppose that there are n distinct, unknown linear objective
functions $v^1, \ldots, v^n$. An instance on day t is specified by a subset of the unknown objective
functions $S^t \subseteq [n] := \{1, \ldots, n\}$ and a convex feasible region $\mathcal{P}^t$, and the goal is to predict:

$$x^{(t)} = \arg\max_x \sum_{i \in S^t} v^i \cdot x \quad \text{such that} \quad x \in \mathcal{P}^t. \qquad (2)$$

When the changing feasible regions $\mathcal{P}^t$ correspond simply to varying prices as in the revealed
preferences problem, this models a setting in which at different times, purchasing decisions are made
by different members of an organization, with heterogeneous preferences, but are still bound by
an organization-wide budget. The learner’s problem is, given the subset of decision makers and the
prices at day t, to predict which bundle they will purchase. This generalizes some of the preference
learning problems recently studied by Blum et al. Of course, in this generality, we may also
consider a richer set of changing constraints which represent things beyond prices and budgets. 

In all of the settings we study, the problem can be viewed as the task of predicting the behavior of a 
rational decision maker, who always chooses the action that maximizes her objective function subject 
to a set of constraints. Some part of her optimization problem is unknown, and the goal is to learn, 
through observing her behavior, that unknown part of her optimization problem sufficiently so that 
we may reliably predict her future actions. 

1.1 Our Results 

We study both variants of the problem (see below) in the strong mistake bound model of learning
[13]. In this model, the learner encounters an arbitrary adversarially chosen sequence of examples
online and must make a prediction for the optimal solution in each example before seeing future
examples. Whenever the learner’s prediction is incorrect, the learner makes a mistake, and
the goal is to prove an upper bound on the number of mistakes the learner can make, in the worst
case over the sequence of examples. Mistake bound learnability is stronger than (and implies) PAC
learnability [13].

Known Objective and Unknown Constraints We first study this problem under the assumption 
that there is a uniform upper bound on the number of bits of precision used to specify the constraint 
defining each example. In this case, we show that there is a learning algorithm with both running time 
and mistake bound linear in the number of edges of the polytope formed by the unknown constraint 
matrix $Ax \leq b$. We note that this is always polynomial in the dimension d when the number of
unknown constraints is at most d + O(1). (In Appendix A, we show that by allowing the learner
to run in time exponential in d, we can give a mistake bound that is always linear in the dimension
and the number of rows of A, but we leave as an open question whether or not this mistake bound
can be achieved by an efficient algorithm.) We then show that our bounded precision assumption
is necessary, i.e. we show that when the precision to which constraints are specified need not be
uniformly upper bounded, then no algorithm for this problem in dimension d ≥ 3 can have a finite
mistake bound.

This lower bound motivates us to study a PAC style variant of the problem, where the examples are 
not chosen in an adversarial manner, but instead are drawn independently at random from an arbitrary 
unknown distribution. In this setting, we show that even if the constraints can be specified to arbitrary 
(even infinite) precision, there is a learner that requires sample complexity only linear in the number 
of edges of the unknown constraint polytope. This learner can be implemented efficiently when the 
constraints are specified with finite precision. 

Known Constraints and Unknown Objective For the variant of the problem in which the objective
is unknown and changing and the constraints are known but changing, we give an algorithm
that has a mistake bound and running time polynomial in the dimension d. Our algorithm uses the 
Ellipsoid algorithm to learn the coefficients of the unknown objective by implementing a separation 
oracle that generates separating hyperplanes given examples on which our algorithm made a mistake. 

We leave the study of either of our problems under natural relaxations (e.g. under a less demanding
loss function) and whether it is possible to substantially improve our results in these relaxations as an 
interesting open problem. 

1.2 Related Work 

Beigman and Vohra [3] were the first to study revealed preference problems (RPP) as learning
problems and to relate them to multi-dimensional classification. They derived sample complexity
bounds for such problems by computing the fat shattering dimension of the class of target utility
functions, and showed that the set of Lipschitz-continuous valuation functions had finite fat-shattering
dimension. Zadimoghaddam and Roth [16] gave efficient algorithms with polynomial sample
complexity for PAC learning of the RPP over the class of linear (and piecewise linear) utility
functions. Balcan et al. [2] showed a connection between RPP and the structured prediction problem
of learning d-dimensional linear classes, and use an efficient variant of the compression
techniques given by Daniely and Shalev-Shwartz to give efficient PAC algorithms with optimal
sample complexity for various classes of economically meaningful utility functions. Amin et al. [1]
study the RPP for linear valuation functions in the mistake bound model, and in the query model
in which the learner gets to set prices and wishes to maximize profit. Roth et al. [14] also study
the query model of learning and give results for strongly concave objective functions, leveraging an
algorithm of Belloni et al. for bandit convex optimization with adversarial noise.

All of the works above focus on the setting of predicting the optimizer of a fixed unknown objective 
function, together with a single known, changing constraint representing prices. This is the primary 
point of departure for our work — we give algorithms for the more general settings of predicting the 
optimizer of a LP when there may be many unknown constraints, or when the unknown objective 
function is changing. Finally, the literature on preference learning (see e.g. [10]) has similar goals,
but is technically quite distinct; the canonical problem in preference learning is to learn a ranking on 
distinct elements. In contrast, the problem we consider here is to predict the outcome of a continuous 
optimization problem as a function of varying constraints. 

2 Model and Preliminaries 

We first formally define the geometric notions used throughout this paper. A hyperplane and a
halfspace in $\mathbb{R}^d$ are the set of points satisfying the linear equation $a_1 x_1 + \ldots + a_d x_d = b$ and the
linear inequality $a_1 x_1 + \ldots + a_d x_d \leq b$ for a set of $a_i$s respectively, assuming that not all $a_i$s are
simultaneously zero. A set of hyperplanes are linearly independent if the normal vectors to the
hyperplanes are linearly independent. A polytope (denoted by $\mathcal{P} \subseteq \mathbb{R}^d$) is the bounded intersection
of finitely many halfspaces, written as $\mathcal{P} = \{x \mid Ax \leq b\}$. An edge-space e of a polytope $\mathcal{P}$ is a one
dimensional subspace that is the intersection of d − 1 linearly independent hyperplanes of $\mathcal{P}$, and an
edge is the intersection between an edge-space e and the polytope $\mathcal{P}$. We denote the set of edges of
polytope $\mathcal{P}$ by $E_{\mathcal{P}}$. A vertex of $\mathcal{P}$ is a point where d linearly independent hyperplanes of $\mathcal{P}$ intersect.
Equivalently, $\mathcal{P}$ can be written as the convex hull of its vertices V, denoted by Conv(V). Finally, we
define a set of points to be collinear if there exists a line that contains all the points in the set.

We study an online prediction problem with the goal of predicting the optimal solution of a changing
LP whose parameters are only partially known. Formally, in each day t = 1, 2, ... an adversary
chooses a LP specified by a polytope $\mathcal{P}^{(t)}$ (a set of linear inequalities) and coefficients $c^{(t)} \in \mathbb{R}^d$
of the linear objective function. The learner’s goal is to predict the solution $x^{(t)}$ where
$x^{(t)} = \arg\max_{x \in \mathcal{P}^{(t)}} c^{(t)} \cdot x$. After making the prediction $\hat{x}^{(t)}$ the learner observes the optimal $x^{(t)}$ and
learns whether she has made a mistake ($\hat{x}^{(t)} \neq x^{(t)}$). The mistake bound is defined as follows.

Definition 1. Given a LP with feasible polytope $\mathcal{P}$ and objective function c, let $\rho^{(t)}$ denote the
parameters of the LP that are revealed to the learner on day t. A learning algorithm $\mathcal{A}$ takes as
input the sequence $\{\rho^{(t)}\}_t$, the known parameters of an adaptively chosen sequence $\{(\mathcal{P}^{(t)}, c^{(t)})\}_t$
of LPs and outputs a sequence of predictions $\{\hat{x}^{(t)}\}_t$. We say that $\mathcal{A}$ has mistake bound M if

$$\max_{\{(\mathcal{P}^{(t)}, c^{(t)})\}_t} \left\{ \sum_{t=1}^{\infty} \mathbf{1}\left[\hat{x}^{(t)} \neq x^{(t)}\right] \right\} \leq M, \quad \text{where } x^{(t)} = \arg\max_{x \in \mathcal{P}^{(t)}} c^{(t)} \cdot x \text{ on day } t.$$
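The online protocol behind Definition 1 can be sketched as a small harness that counts wrong predictions. This is only an illustration under assumed names: `count_mistakes`, the `predict`/`update` interface and the toy `MemorizingLearner` are hypothetical and do not appear in the paper.

```python
def count_mistakes(learner, examples):
    """Run `learner` online over (revealed, solution) pairs and count wrong predictions."""
    mistakes = 0
    for revealed, solution in examples:
        prediction = learner.predict(revealed)   # must commit before seeing the solution
        if prediction != solution:
            mistakes += 1
        learner.update(revealed, solution)       # the true optimum is observed every day
    return mistakes


class MemorizingLearner:
    """Toy learner: repeats the solution last seen for the same revealed parameters."""

    def __init__(self):
        self.seen = {}

    def predict(self, revealed):
        return self.seen.get(revealed)

    def update(self, revealed, solution):
        self.seen[revealed] = solution
```

A learner has mistake bound M when `count_mistakes` returns at most M for every example sequence an adversary can produce. The toy learner above has no such finite bound in general, since each previously unseen constraint can cost it a mistake.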

We consider two different instances of the problem described above. First, in Section 3, we study
the problem given in (1) in which $c^{(t)} = c$ is fixed and known to the learner but the polytope $\mathcal{P}^{(t)} = \mathcal{P} \cap \mathcal{N}^{(t)}$
consists of an unknown fixed polytope $\mathcal{P}$ and a new constraint $\mathcal{N}^{(t)} = \{x \mid p^{(t)} \cdot x \leq b^{(t)}\}$
which is revealed to the learner on day t, i.e. $\rho^{(t)} = (\mathcal{N}^{(t)}, c)$. We refer to this as the Known Objective
problem. Then, in Section 4, we study the problem in which the polytope $\mathcal{P}^{(t)}$ is changing and known
but the objective function $c^{(t)} = \sum_{i \in S^{(t)}} v^i$ is unknown and changing as in (2), where the set $S^{(t)}$ is
known, i.e. $\rho^{(t)} = (\mathcal{P}^{(t)}, S^{(t)})$. We refer to this as the Known Constraints problem.

In order for our prediction problem to be well defined, we make Assumption 1 about the observed
solution $x^{(t)}$ in each day. Assumption 1 guarantees that each solution is on a vertex of $\mathcal{P}^{(t)}$.

Assumption 1. The optimal solution to the LP $\max_{x \in \mathcal{P}^{(t)}} c^{(t)} \cdot x$ is unique for all t.


3 The Known Objective Problem 


In this section, we focus on the Known Objective Problem where the coefficients of the objective
function c are fixed and known to the learner but the feasible region $\mathcal{P}^{(t)}$ on day t is unknown and
changing. In particular, $\mathcal{P}^{(t)}$ is the intersection of a fixed and unknown polytope $\mathcal{P} = \{x \mid Ax \leq b, A \in \mathbb{R}^{m \times d}\}$
and a known halfspace $\mathcal{N}^{(t)} = \{x \mid p^{(t)} \cdot x \leq b^{(t)}\}$, i.e. $\mathcal{P}^{(t)} = \mathcal{P} \cap \mathcal{N}^{(t)}$.

Throughout this section we make the following assumptions. First, we assume w.l.o.g. (up to scaling)
that the points in $\mathcal{P}$ have $\ell_\infty$-norm bounded by 1.

Assumption 2. The unknown polytope $\mathcal{P}$ lies inside the unit $\ell_\infty$-ball, i.e. $\mathcal{P} \subseteq \{x \mid \|x\|_\infty \leq 1\}$.

We also assume that the coordinates of the vertices in $\mathcal{P}$ can be written with finite precision (this is
implied if the halfspaces defining $\mathcal{P}$ can be described with finite precision).

Assumption 3. The coordinates of each vertex of $\mathcal{P}$ can be written with N bits of precision.

We show in Section 3.3 that Assumption 3 is necessary: without any upper bound on precision,
there is no algorithm with a finite mistake bound. Next, we make some non-degeneracy assumptions
on polytopes $\mathcal{P}$ and $\mathcal{P}^{(t)}$, respectively. We require these assumptions to hold on each day.

Assumption 4. Any subset of d − 1 rows of A has rank d − 1, where A is the constraint matrix in
$\mathcal{P} = \{x \mid Ax \leq b\}$.

Assumption 5. Each vertex of $\mathcal{P}^{(t)}$ is the intersection of exactly d hyperplanes of $\mathcal{P}^{(t)}$.


The rest of this section is organized as follows. We present LearnEdge for the Known Objective
Problem and analyze its mistake bound in Sections 3.1 and 3.2 respectively. Then in Section 3.3
we prove the necessity of Assumption 3 to get a finite mistake bound. Finally in Section 3.4 we
present LearnHull in a PAC style setting where the new constraint each day is drawn i.i.d. from
an unknown distribution, rather than selected adversarially.


3.1 LearnEdge Algorithm 

In this section we introduce LearnEdge and show in Theorem 1 that the number of mistakes of
LearnEdge depends linearly on the number of edges $E_{\mathcal{P}}$ and the precision parameter N and only
logarithmically on the dimension d. We defer all the proofs of this section to Appendix B.

Theorem 1. The number of mistakes and per day running time of LearnEdge in the Known Objective
Problem are $O(|E_{\mathcal{P}}| N \log(d))$ and $\mathrm{poly}(m, d, |E_{\mathcal{P}}|)$ respectively when $A \in \mathbb{R}^{m \times d}$.


At a high level, LearnEdge maintains a set of prediction information $\mathcal{I}^{(t)}$ about the prediction history
up to day t, and makes a prediction in each day based on $\mathcal{I}^{(t)}$ and a set of prediction rules (P.1 - P.4).
After making a mistake, LearnEdge updates the information with a set of update rules (U.1 - U.4).
The framework of LearnEdge is presented in Algorithm 1. We will now present the details of each
component.


¹Lemma 6.2.4 from Grötschel et al. [11] states that if each constraint in $\mathcal{P} \subseteq \mathbb{R}^d$ has encoding length at most
N then each vertex of $\mathcal{P}$ has encoding length at most $4d^2N$. Typically the finite precision assumption is made
on the constraints of the LP. However, since this assumption implies that the vertices can be described with finite
precision, for simplicity, we make our assumption directly on the vertices.

Algorithm 1 Learning in Known Objective Problem (LearnEdge)

procedure LearnEdge($\{\mathcal{N}^{(t)}, x^{(t)}\}_t$)    ▷ Against adaptive adversary
    Initialize $\mathcal{I}^{(1)}$ to be empty.    ▷ Initialize
    for t = 1, 2, ... do
        Predict $\hat{x}^{(t)}$ according to one of P.1 - P.4 based on $\mathcal{I}^{(t)}$ and $\mathcal{N}^{(t)}$.    ▷ Predict
        if $\hat{x}^{(t)} \neq x^{(t)}$ then
            $X^{(t+1)} \leftarrow X^{(t)} \cup \{x^{(t)}\}$.
            Update $\mathcal{I}^{(t+1)}$ with U.1 - U.4 based on $x^{(t)}$.    ▷ Update
end procedure


Prediction Information It is natural to ask “What information is useful for prediction?” Lemma 2
establishes the importance of the set of edges $E_{\mathcal{P}}$ by showing that all the observed solutions will be
on an element of $E_{\mathcal{P}}$.

Lemma 2. On any day t, the observed solution $x^{(t)}$ lies on an edge in $E_{\mathcal{P}}$.

In the proof of Lemma 2 we also show that when $x^{(t)}$ does not bind the new constraint, then
$x^{(t)}$ is the solution for the underlying LP: $\arg\max_{x \in \mathcal{P}} c \cdot x$.

Corollary 1. If $x^{(t)} \in \{x \mid p^{(t)} \cdot x < b^{(t)}\}$ then $x^{(t)} = x^* = \arg\max_{x \in \mathcal{P}} c \cdot x$.

We then show how an edge-space e of $\mathcal{P}$ can be recovered after seeing 3 collinear observed solutions.

Lemma 3. Let x, y, z be 3 distinct collinear points on edges of $\mathcal{P}$. Then they are all on the same
edge of $\mathcal{P}$ and the 1-dimensional subspace containing them is an edge-space of $\mathcal{P}$.
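Lemma 3 suggests a simple computational test: a new observed solution reveals an edge-space as soon as it is collinear with two earlier ones. The sketch below is one possible way to implement that check with exact rational arithmetic, which fits the finite precision assumption. The function names `collinear` and `find_edge_space` are hypothetical, and the points are assumed to be distinct.

```python
from fractions import Fraction
from itertools import combinations

def collinear(x, y, z):
    """True if the three points lie on one line, i.e. y - x and z - x are parallel."""
    u = [Fraction(b) - Fraction(a) for a, b in zip(x, y)]
    v = [Fraction(b) - Fraction(a) for a, b in zip(x, z)]
    # Two vectors are parallel iff every 2x2 minor of the matrix [u; v] vanishes.
    return all(u[i] * v[j] == u[j] * v[i] for i, j in combinations(range(len(u)), 2))

def find_edge_space(new_point, observed):
    """Return (point, direction) of a line through new_point and two earlier points, or None."""
    for y, z in combinations(observed, 2):
        if collinear(new_point, y, z):
            direction = tuple(Fraction(b) - Fraction(a) for a, b in zip(y, z))
            return tuple(y), direction
    return None
```

The pairwise search costs time quadratic in the number of stored solutions per new point, which is only meant to illustrate the idea rather than to match the running time stated in Theorem 1.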

Given the relation between observed solutions and edges, the information is stored as follows: 

Figure 1: Regions on an edge-space e: feasible
region $F_e$ (blue), questionable intervals $Q_e^0$ and
$Q_e^1$ (green) with their mid-points $M_e^0$ and $M_e^1$ and
infeasible regions $Y_e^0$ and $Y_e^1$ (dashed).



I.1 (Observed Solutions) LearnEdge keeps track of the set of observed solutions that were
predicted incorrectly so far, $X^{(t)} = \{x^{(\tau)} : \tau < t,\ \hat{x}^{(\tau)} \neq x^{(\tau)}\}$, and also the solution for
the underlying unknown polytope $x^* = \arg\max_{x \in \mathcal{P}} c \cdot x$ if it is observed.

I.2 (Edges) LearnEdge keeps track of the set of edge-spaces $E^{(t)}$ given by any 3 collinear
points in $X^{(t)}$. For each $e \in E^{(t)}$, it also maintains the regions on e that are certainly
feasible or infeasible. The remaining part of e, called the questionable region, is where
LearnEdge cannot classify points as infeasible or feasible with certainty (see Figure 1). Formally,

1. (Feasible Interval) The feasible interval $F_e$ is an interval along e that is identified to be on
the boundary of $\mathcal{P}$. More formally, $F_e = \mathrm{Conv}(X^{(t)} \cap e)$.

2. (Infeasible Region) The infeasible region $Y_e = Y_e^0 \cup Y_e^1$ is the union of two disjoint
intervals $Y_e^0$ and $Y_e^1$ that are identified to be outside of $\mathcal{P}$. By Assumption 2, we initialize
the infeasible region $Y_e$ to $\{x \in e \mid \|x\|_\infty > 1\}$ for all e.

3. (Questionable Region) The questionable region $Q_e = Q_e^0 \cup Q_e^1$ on e is the union of two
disjoint questionable intervals along e. Formally, $Q_e = e \setminus (F_e \cup Y_e)$. The points in $Q_e$
cannot be certified to be either inside or outside of $\mathcal{P}$ by LearnEdge.

4. (Midpoints in $Q_e$) For each questionable interval $Q_e^i$, let $M_e^i$ denote the midpoint of $Q_e^i$.

We add the superscript (t) to show the dependence of these quantities on days. Furthermore, we
eliminate the subscript e when taking the union over all elements in $E^{(t)}$, e.g. $F^{(t)} = \bigcup_{e \in E^{(t)}} F_e^{(t)}$.
So the information $\mathcal{I}^{(t)}$ can be written as follows: $\mathcal{I}^{(t)} = (X^{(t)}, E^{(t)}, F^{(t)}, Y^{(t)}, Q^{(t)}, M^{(t)})$.
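The per-edge bookkeeping described above can be pictured with a one-dimensional coordinate s along each edge-space. The record below is a minimal sketch under that assumption: the class `EdgeInfo`, its field names and the treatment of boundary points are hypothetical choices made for illustration, not the paper's data structure.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class EdgeInfo:
    """Regions on one edge-space, in a 1-D coordinate s along the line (assumed layout)."""
    y_lo: Fraction  # points with s < y_lo are known infeasible (Y^0)
    f_lo: Fraction  # feasible interval F is [f_lo, f_hi]
    f_hi: Fraction
    y_hi: Fraction  # points with s > y_hi are known infeasible (Y^1)

    def midpoints(self):
        """Midpoints M^0 and M^1 of the questionable intervals Q^0 and Q^1."""
        return (self.y_lo + self.f_lo) / 2, (self.f_hi + self.y_hi) / 2

    def region(self, s):
        """Classify coordinate s as 'feasible', 'questionable' or 'infeasible'."""
        if self.f_lo <= s <= self.f_hi:
            return "feasible"
        if s < self.y_lo or s > self.y_hi:
            return "infeasible"
        return "questionable"
```

With this layout, the questionable intervals are the two gaps between the feasible interval and the infeasible region, so learning an edge amounts to closing both gaps.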
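
Prediction Rules We now focus on the prediction rules of LearnEdge. On day t, let $\widetilde{\mathcal{N}}^{(t)} = \{x \mid p^{(t)} \cdot x = b^{(t)}\}$
be the hyperplane specified by the additional constraint $\mathcal{N}^{(t)}$. If $x^{(t)} \notin \widetilde{\mathcal{N}}^{(t)}$, then
$x^{(t)} = x^*$ by Corollary 1. So whenever the algorithm observes $x^*$, it will store $x^*$ and predict it in
the future days when $x^* \in \mathcal{N}^{(t)}$. This is case P.1. So in the remaining cases we know $x^* \notin \mathcal{N}^{(t)}$.

The analysis of Lemma 2 shows that $x^{(t)}$ must be in the intersection between $\widetilde{\mathcal{N}}^{(t)}$ and the edges $E_{\mathcal{P}}$,
so $x^{(t)} = \arg\max_{x \in \widetilde{\mathcal{N}}^{(t)} \cap E_{\mathcal{P}}} c \cdot x$. Hence, LearnEdge can restrict its prediction to the following
candidate set: $\mathrm{Cand}^{(t)} = \{(E^{(t)} \cup X^{(t)}) \setminus \bar{E}^{(t)}\} \cap \widetilde{\mathcal{N}}^{(t)}$, where $\bar{E}^{(t)} = \{e \in E^{(t)} \mid e \subseteq \widetilde{\mathcal{N}}^{(t)}\}$. As
we show in Lemma 4, $x^{(t)}$ will not be in $\bar{E}^{(t)}$, so it is safe to remove $\bar{E}^{(t)}$ from $\mathrm{Cand}^{(t)}$.

Lemma 4. Let e be an edge-space of $\mathcal{P}$ such that $e \subseteq \widetilde{\mathcal{N}}^{(t)}$, then $x^{(t)} \notin e$.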


However, $\mathrm{Cand}^{(t)}$ can be empty or only contain points in the infeasible regions of the edge-spaces. If
so, then there is simply not enough information to predict a feasible point in $\mathcal{P}$. Hence, LearnEdge
predicts an arbitrary point outside of $\mathrm{Cand}^{(t)}$. This is case P.2.

Otherwise $\mathrm{Cand}^{(t)}$ contains points from the feasible and questionable regions of the edge-spaces.
LearnEdge predicts from a subset of $\mathrm{Cand}^{(t)}$ called the extended feasible region $\mathrm{Ext}^{(t)}$ instead of
directly predicting from $\mathrm{Cand}^{(t)}$. $\mathrm{Ext}^{(t)}$ contains the whole feasible region and only parts of the
questionable region on all the edge-spaces in $E^{(t)} \setminus \bar{E}^{(t)}$. We will show later that this guarantees
LearnEdge makes progress in learning the true feasible region on some edge-space upon making a
mistake. More formally, $\mathrm{Ext}^{(t)}$ is the intersection of $\widetilde{\mathcal{N}}^{(t)}$ with the union of intervals between the
two mid-points $(M_e^0)^{(t)}$ and $(M_e^1)^{(t)}$ on every edge-space $e \in E^{(t)} \setminus \bar{E}^{(t)}$ and all points in $X^{(t)}$:

$$\mathrm{Ext}^{(t)} = \left\{ X^{(t)} \cup \left\{ \bigcup_{e \in E^{(t)} \setminus \bar{E}^{(t)}} \mathrm{Conv}\left((M_e^0)^{(t)}, (M_e^1)^{(t)}\right) \right\} \right\} \cap \widetilde{\mathcal{N}}^{(t)}.$$

In P.3, if $\mathrm{Ext}^{(t)} \neq \emptyset$ then LearnEdge predicts the point with the highest objective value in $\mathrm{Ext}^{(t)}$.

Finally, if $\mathrm{Ext}^{(t)} = \emptyset$, then we know $\widetilde{\mathcal{N}}^{(t)}$ only intersects within the questionable regions of the
learned edge-spaces. In this case, LearnEdge predicts the intersection point with the lowest objective
value, which corresponds to P.4. Although it might seem counter-intuitive to predict the point with the
lowest objective value, this guarantees that LearnEdge makes progress in learning the true feasible
region on some edge-space upon making a mistake. The prediction rules are summarized as follows:


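P.1 First, if $x^*$ is observed and $x^* \in \mathcal{N}^{(t)}$, then predict $\hat{x}^{(t)} \leftarrow x^*$;

P.2 Else if $\mathrm{Cand}^{(t)} = \emptyset$ or $\mathrm{Cand}^{(t)} \subseteq \bigcup_{e \in E^{(t)}} Y_e^{(t)}$, then predict any point outside $\mathrm{Cand}^{(t)}$;

P.3 Else if $\mathrm{Ext}^{(t)} \neq \emptyset$, then predict $\hat{x}^{(t)} = \arg\max_{x \in \mathrm{Ext}^{(t)}} c \cdot x$;

P.4 Else, predict $\hat{x}^{(t)} = \arg\min_{x \in \mathrm{Cand}^{(t)}} c \cdot x$.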
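The order in which the four rules fire can be sketched as a small dispatch function. The sketch assumes the candidate set has already been computed and that each candidate carries a region label and a flag saying whether it lies in the extended feasible region. The function name `choose_prediction` and the tuple encoding are hypothetical, and P.4 is implemented literally as stated above, as a minimum over the whole candidate set.

```python
def choose_prediction(x_star, x_star_allowed, cand, objective):
    """Pick a prediction following the order P.1 to P.4.

    cand: list of (point, region, in_ext) with region in
          {"feasible", "questionable", "infeasible"}; objective maps a point to c.x.
    Returns (rule, point); point is None for P.2, where any point outside Cand will do.
    """
    # P.1: the unconstrained optimum is known and satisfies today's constraint.
    if x_star is not None and x_star_allowed:
        return "P.1", x_star
    # P.2: nothing usable is known, so the prediction is arbitrary.
    if not cand or all(region == "infeasible" for _, region, _ in cand):
        return "P.2", None
    # P.3: best candidate inside the extended feasible region.
    ext = [pt for pt, _, in_ext in cand if in_ext]
    if ext:
        return "P.3", max(ext, key=objective)
    # P.4: otherwise predict the candidate with the lowest objective value.
    return "P.4", min((pt for pt, _, _ in cand), key=objective)
```

Keeping the rules in a single ordered chain mirrors the "first, else if, else" structure of the summary, so exactly one rule is applied each day.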


Update Rules Next we describe how LearnEdge updates its information. Upon making a mistake,
LearnEdge adds $x^{(t)}$ to the set of previously observed solutions $X^{(t)}$, i.e. $X^{(t+1)} \leftarrow X^{(t)} \cup \{x^{(t)}\}$.
Then it performs one of the following four mutually exclusive update rules (U.1 - U.4) in order.

U.1 If $x^{(t)} \notin \widetilde{\mathcal{N}}^{(t)}$, then LearnEdge records $x^{(t)}$ as the unconstrained optimal solution $x^*$.

U.2 Then if $x^{(t)}$ is not on any learned edge-space in $E^{(t)}$, LearnEdge will try to learn a new
edge-space by checking the collinearity of $x^{(t)}$ and any pair of points in $X^{(t)}$. So after
this update LearnEdge might recover a new edge-space of the polytope.

If the previous updates were not invoked, then $\hat{x}^{(t)}$ was on some learned edge-space e. LearnEdge
then compares the objective values of $\hat{x}^{(t)}$ and $x^{(t)}$ (we know $c \cdot \hat{x}^{(t)} \neq c \cdot x^{(t)}$ by Assumption 1):

U.3 If $c \cdot \hat{x}^{(t)} > c \cdot x^{(t)}$, then $\hat{x}^{(t)}$ must be infeasible and LearnEdge then updates the questionable
and infeasible regions for e.

U.4 If $c \cdot \hat{x}^{(t)} < c \cdot x^{(t)}$ then $x^{(t)}$ was outside of the extended feasible region of e. LearnEdge
then updates the questionable region and feasible interval on e.

In both of U.3 and U.4 LearnEdge will shrink some questionable interval substantially until the
interval has length less than $2^{-N}$, in which case Assumption 3 implies that the interval contains no
points. So LearnEdge can update the adjacent feasible region and infeasible interval accordingly.
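To get a feel for why the precision parameter N enters the mistake bound, the sketch below counts how many shrinking steps are needed before a questionable interval drops below length $2^{-N}$. It assumes, purely for illustration, that every update at least halves the interval; the text only says the interval shrinks "substantially", and the actual argument is deferred to Appendix B. The function name `halvings_until_resolved` is hypothetical.

```python
from fractions import Fraction

def halvings_until_resolved(length, n_bits):
    """Count halvings until an interval of the given length is shorter than 2**-n_bits."""
    length = Fraction(length)
    threshold = Fraction(1, 2 ** n_bits)
    steps = 0
    while length >= threshold:
        length /= 2      # assumed effect of one U.3 or U.4 update on this interval
        steps += 1
    return steps
```

Under this assumption an interval of length 2 with N = 10 is resolved after 12 halvings, so the number of updates per interval grows linearly in N rather than with the number of representable points.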


3.2 Analysis of LearnEdge 


Whenever LearnEdge makes a mistake, one of the update rules U.1 - U.4 is invoked. So the number
of mistakes of LearnEdge is bounded by the number of times each update rule is invoked. The
mistake bound of LearnEdge in Theorem 1 is hence the sum of mistake bounds in Lemmas 5-7.

Lemma 5. Update U.1 is invoked at most 1 time.

Lemma 6. Update U.2 is invoked at most $3|E_{\mathcal{P}}|$ times.

Lemma 7. Updates U.3 and U.4 are invoked at most $O(|E_{\mathcal{P}}| N \log(d))$ times.


3.3 Necessity of the Precision Bound 

We show the necessity of Assumption 3 by showing that the dependence on the precision parameter
N in our mistake bound is tight. We show that subject to Assumption 3, there exist a polytope and a
sequence of additional constraints such that any learning algorithm will make $\Omega(N)$ mistakes. This
implies that without any upper bound on precision, it is impossible to learn with finite mistakes.

Theorem 8. For any learning algorithm $\mathcal{A}$ in the Known Objective Problem and any d ≥ 3, there
exists a polytope $\mathcal{P}$ and a sequence of additional constraints $\{\mathcal{N}^{(t)}\}_t$ such that the number of mistakes
made by $\mathcal{A}$ is at least $\Omega(N)$.


3.4 Stochastic Setting 

Given the lower bound in Theorem 8, we ask “In what settings can we still learn without an upper
bound on the precision to which constraints are specified?” The lower bound implies we must
abandon the adversarial setting so we consider a PAC style variant. In this variant, the additional
constraint at each day t is drawn i.i.d. from some fixed but unknown distribution $\mathcal{D}$ over $\mathbb{R}^d \times \mathbb{R}$ such
that each point (p, b) drawn from $\mathcal{D}$ corresponds to the halfspace $\mathcal{N} = \{x \mid p \cdot x \leq b\}$. We make no
assumption on the form of $\mathcal{D}$ and require our bounds to hold in the worst case over all choices of $\mathcal{D}$.

We describe LearnHull, an algorithm based on the following high level idea: LearnHull keeps track
of the convex hull $\mathcal{C}^{(t-1)}$ of all the solutions observed up to day t. LearnHull then behaves as if this
convex hull is the entire feasible region. So at day t, given the constraint $\mathcal{N}^{(t)} = \{x \mid p^{(t)} \cdot x \leq b^{(t)}\}$,
LearnHull predicts $\hat{x}^{(t)}$ where

$$\hat{x}^{(t)} = \arg\max_{x \in \mathcal{C}^{(t-1)} \cap \mathcal{N}^{(t)}} c \cdot x. \qquad (3)$$

LearnHull’s hypothetical feasible region is therefore always a subset of the true feasible region,
i.e. it can never make a mistake because its prediction was infeasible, but only because its prediction
was sub-optimal. Hence, whenever LearnHull makes a mistake, it must have observed a point that
expands the convex hull. Hence, whenever it fails to predict $x^{(t)}$, LearnHull will enlarge its feasible
region by adding the point $x^{(t)}$ to the convex hull:

$$\mathcal{C}^{(t)} \leftarrow \mathrm{Conv}(\mathcal{C}^{(t-1)} \cup \{x^{(t)}\}), \qquad (4)$$

otherwise it will simply set $\mathcal{C}^{(t)} \leftarrow \mathcal{C}^{(t-1)}$ for the next day. LearnHull is described formally in
Algorithm 2.
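The predict-and-update loop of LearnHull can be sketched without an LP solver by brute force. The sketch below relies on the fact that a linear objective over the convex hull of finitely many points, cut by one halfspace, is maximized either at a stored point inside the halfspace or where a segment between two stored points crosses the bounding hyperplane. The names `dot`, `best_in_hull` and the class layout are hypothetical, and the quadratic pairwise search is an illustrative substitute for the LP solver mentioned in the footnote.

```python
from fractions import Fraction
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def best_in_hull(points, c, p, b):
    """Maximize c.x over Conv(points) intersected with {x : p.x <= b}; None if empty."""
    pts = [tuple(Fraction(a) for a in pt) for pt in points]
    # Candidates: stored points inside the halfspace, plus crossing points of
    # segments whose endpoints lie strictly on opposite sides of p.x = b.
    cands = [x for x in pts if dot(p, x) <= b]
    for x, y in combinations(pts, 2):
        gx, gy = dot(p, x) - b, dot(p, y) - b
        if gx * gy < 0:
            lam = gx / (gx - gy)
            cands.append(tuple(a + lam * (bb - a) for a, bb in zip(x, y)))
    return max(cands, key=lambda x: dot(c, x)) if cands else None

class LearnHull:
    def __init__(self, c):
        self.c = c
        self.hull_points = []  # generators of the current hull; Conv is never built explicitly

    def predict(self, p, b):
        return best_in_hull(self.hull_points, self.c, p, b)

    def update(self, prediction, solution):
        solution = tuple(Fraction(a) for a in solution)
        if prediction != solution:
            self.hull_points.append(solution)  # enlarge the hull as in update (4)
```

As a usage example, with the unit square as the hidden polytope and c = (2, 1), the learner errs on the constraints $x_1 \leq 1/2$ and $x_2 \leq 1/2$ when it first sees them, and then predicts (1/2, 1) correctly when the first constraint reappears.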

We show that the expected number of mistakes of LearnHull over T days is linear in the number of
edges of $\mathcal{P}$ and only logarithmic in T.

²The dependency on $|E_{\mathcal{P}}|$ can be improved by replacing it with the set of edges of $\mathcal{P}$ on which an optimal
solution is observed. This applies to all the dependencies on $|E_{\mathcal{P}}|$ in our bounds.

³We point out that the condition d ≥ 3 is necessary in the statement of Theorem 8 since there exist learning
algorithms for d = 1 and d = 2 with finite mistake bounds independent of N. See Appendix C.

⁴LearnHull can be implemented efficiently in time poly(T, N, d) if all of the coefficients in the unknown
constraints in $\mathcal{P}$ are represented in N bits. Note that given the observed solutions so far and a new point, a
separation oracle can be implemented in time poly(T, N, d) using a LP solver.

Algorithm 2 Stochastic Procedure (LearnHull)

procedure LearnHull($\mathcal{D}$)
    $\mathcal{C}^{(0)} \leftarrow \emptyset$.    ▷ Initialize
    for t = 1, 2, ... do
        Observe $\mathcal{N}^{(t)} \sim \mathcal{D}$ and set $\hat{x}^{(t)}$ as in (3).    ▷ Predict
        Observe $x^{(t)} = \arg\max_{x \in \mathcal{P} \cap \mathcal{N}^{(t)}} c \cdot x$ and update $\mathcal{C}^{(t)}$ as in (4).    ▷ Update
end procedure


Theorem 9. For any T > 0 and any constraint distribution $\mathcal{D}$, the expected number of mistakes of
LearnHull after T days is bounded by $O(|E_{\mathcal{P}}| \log(T))$.

To prove Theorem 9, first in Lemma 10 we bound the probability that the solution observed at day t
falls outside of the convex hull of the previously observed solutions. This is the only event that can
cause LearnHull to make a mistake. In Lemma 10 we abstract away the fact that the point observed
at each day is the solution to some optimization problem.

Lemma 10. Let $\mathcal{P}$ be a polytope and $\mathcal{D}$ a distribution over points on $E_{\mathcal{P}}$. Let $X = \{x_1, \ldots, x_{t-1}\}$ be
t − 1 i.i.d. draws from $\mathcal{D}$ and $x_t$ an additional independent draw from $\mathcal{D}$. Then $\Pr[x_t \notin \mathrm{Conv}(X)] \leq 2|E_{\mathcal{P}}|/t$,
where the probability is taken over the draws of points $x_1, \ldots, x_t$ from $\mathcal{D}$.
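The full proof is deferred to the appendix, but one plausible way to see how a per-day bound of this form yields the logarithmic dependence on T is to sum it over days. The derivation below is a sketch of that step only. It assumes the observed solutions are i.i.d. (they are deterministic functions of the i.i.d. constraints) and uses the standard bound on the harmonic sum.

```latex
\begin{align*}
\mathbb{E}[\#\text{mistakes in } T \text{ days}]
  &= \sum_{t=1}^{T} \Pr[\text{mistake on day } t] \\
  &\le \sum_{t=1}^{T} \Pr\big[x^{(t)} \notin \mathrm{Conv}(x^{(1)}, \ldots, x^{(t-1)})\big] \\
  &\le \sum_{t=1}^{T} \frac{2\,|E_{\mathcal{P}}|}{t}
   \;\le\; 2\,|E_{\mathcal{P}}|\,(1 + \ln T)
   \;=\; O\big(|E_{\mathcal{P}}| \log T\big).
\end{align*}
```

The first inequality uses the observation that a mistake can only happen when the new solution falls outside the hull of the earlier ones, and the second applies the per-day bound of the lemma.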


Finally in Theorem 11 we convert the bound on the expected number of mistakes of LearnHull in
Theorem 9 to a high probability bound.


Theorem 11. There exists a deterministic procedure such that after $T = O(|E_{\mathcal{P}}| \log(1/\delta))$ days,
the probability (over the randomness of the additional constraint) that the procedure makes a mistake
on day T + 1 is at most $\delta$ for any $\delta \in (0, 1/2)$.


4 The Known Constraints Problem 

We now consider the Known Constraints Problem in which the learner observes the changing
constraint polytope $\mathcal{P}^{(t)}$ at each day, but does not know the changing objective function which we
assume to be written as $c^{(t)} = \sum_{i \in S^{(t)}} v^i$, where $\{v^i\}_{i \in [n]}$ are fixed but unknown. Given $\mathcal{P}^{(t)}$ and
the subset $S^{(t)} \subseteq [n]$, the learner must make a prediction $\hat{x}^{(t)}$ on each day. Inspired by Bhaskar et
al., we use the Ellipsoid algorithm to learn the coefficients $\{v^i\}_{i \in [n]}$, and show that the mistake
bound of the resulting algorithm is bounded by the (polynomial) running time of the Ellipsoid. We
use $V \in \mathbb{R}^{d \times n}$ to denote the matrix whose columns are $v^i$ and make the following assumption on V.

Assumption 6. Each entry in V can be written with N bits of precision. Also w.l.o.g. $\|V\|_F \leq 1$.

Similar to Section 3 we assume the coordinates of $\mathcal{P}^{(t)}$’s vertices can be written with finite precision.⁶

Assumption 7. The coordinates of each vertex of $\mathcal{P}^{(t)}$ can be written with N bits of precision.

We first observe that the coefficients of the objective function represent a point that is guaranteed to
lie in a region $\mathcal{F}$ (described below) which may be written as the intersection of possibly infinitely
many halfspaces. Given a subset $S \subseteq [n]$ and a polytope $\mathcal{P}$, let $x^{(S, \mathcal{P})}$ denote the optimal solution to
the instance defined by S and $\mathcal{P}$. Informally, the halfspaces defining $\mathcal{F}$ ensure that for any problem
instance defined by arbitrary choices of S and $\mathcal{P}$, the objective value of the optimal solution $x^{(S, \mathcal{P})}$
must be at least as high as the objective value of any feasible point in $\mathcal{P}$. Since the convergence rate
of the Ellipsoid algorithm depends on the precision to which constraints are specified, we do not in
fact consider a hyperplane for every feasible solution but only for those solutions that are vertices of
the feasible polytope $\mathcal{P}$. This is not a relaxation, since LPs always have vertex-optimal solutions.

We denote the set of all vertices of polytope $\mathcal{P}$ by vert($\mathcal{P}$), and the set of polytopes $\mathcal{P}$ satisfying
Assumption 7 by $\Phi$. We then define $\mathcal{F}$ as follows:

$$\mathcal{F} = \left\{ W = (w^1, \ldots, w^n) \in \mathbb{R}^{d \times n} \;\middle|\; \forall S \subseteq [n],\ \forall \mathcal{P} \in \Phi,\ \sum_{i \in S} w^i \cdot (x^{(S, \mathcal{P})} - x) \geq 0,\ \forall x \in \mathrm{vert}(\mathcal{P}) \right\} \qquad (5)$$

⁵LearnEdge fails to give any non-trivial mistake bound in the adversarial setting.

⁶We again point out that this is implied if the halfspaces defining the polytope are described with finite
precision [11].

The idea behind our LearnEllipsoid algorithm is that we will run a copy of the Ellipsoid algorithm
with variables $w \in \mathbb{R}^{d \times n}$, as if we were solving the feasibility LP defined by the constraints defining
$\mathcal{F}$. We will always predict according to the centroid of the ellipsoid maintained by the Ellipsoid
algorithm (i.e. its candidate solution). Whenever a mistake occurs, we are able to find one of the
constraints that define $\mathcal{F}$ such that our prediction violates the constraint, which is exactly what is needed to
take a step in solving the feasibility LP. Since we know $\mathcal{F}$ is non-empty (at least the true objective
function V lies within it) we know that the LP we are solving is feasible. Given the polynomial
convergence time of the Ellipsoid algorithm, this gives a polynomial mistake bound for our algorithm.

The Ellipsoid algorithm will generate a sequence of ellipsoids with decreasing volume such that each
one contains the feasible region $\mathcal{F}$. Given the ellipsoid $\mathcal{E}^{(t)}$ at day t, LearnEllipsoid uses the centroid
of $\mathcal{E}^{(t)}$ as its hypothesis for the objective function $W^{(t)} = ((w^1)^{(t)}, \ldots, (w^n)^{(t)})$. Given the subset $S^{(t)}$
and polytope $\mathcal{P}^{(t)}$, LearnEllipsoid predicts

$$\hat{x}^{(t)} \in \arg\max_{x \in \mathcal{P}^{(t)}} \left\{ \sum_{i \in S^{(t)}} (w^i)^{(t)} \cdot x \right\}. \qquad (6)$$

When a mistake occurs, LearnEllipsoid finds the hyperplane

$$\mathcal{H}^{(t)} = \left\{ W = (w^1, \ldots, w^n) \in \mathbb{R}^{d \times n} : \sum_{i \in S^{(t)}} w^i \cdot (x^{(t)} - \hat{x}^{(t)}) > 0 \right\} \qquad (7)$$

that separates the centroid of the current ellipsoid (the current candidate objective) from $\mathcal{F}$.

After the update, we use the Ellipsoid algorithm to compute the minimum-volume ellipsoid $\mathcal{E}^{(t+1)}$
that contains $\mathcal{H}^{(t)} \cap \mathcal{E}^{(t)}$. On day t + 1, LearnEllipsoid sets $W^{(t+1)}$ to be the centroid of $\mathcal{E}^{(t+1)}$.
The above procedure is formalized in Algorithm 3.


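Algorithm 3 Learning with Known Constraints (LearnEllipsoid)

procedure LearnEllipsoid(A)                                ▷ Against adversary A
    $\mathcal{L}^{(1)} \leftarrow (\mathcal{E}^{(1)} = \{z \in \mathbb{R}^{n \times d} : \|z\|_p < \ \},\ W^{(1)} = \mathrm{CENTROID}(\mathcal{E}^{(1)}))$    ▷ Initialize
    for $t = 1 \dots$ do
        Given $S^{(t)}$ and $\mathcal{P}^{(t)}$, set $\hat{x}^{(t)}$ as in (6).    ▷ Predict
        if $\hat{x}^{(t)} \ne x^{(t)}$ then
            set $\mathcal{H}^{(t)}$ as in (7).    ▷ Update
            $\mathcal{E}^{(t+1)} \leftarrow \mathrm{ELLIPSOID}(\mathcal{H}^{(t)}, \mathcal{E}^{(t)})$, $W^{(t+1)} = \mathrm{CENTROID}(\mathcal{E}^{(t+1)})$.
            $\mathcal{L}^{(t+1)} = (\mathcal{E}^{(t+1)}, W^{(t+1)})$.
        else
            $\mathcal{L}^{(t+1)} \leftarrow \mathcal{L}^{(t)}$.
end procedure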
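The update step can be illustrated with a minimal sketch of one ellipsoid cut. This is not the authors' implementation: it assumes the hypothesis $W$ is flattened into a vector of length $n \cdot d$, it requires dimension at least 2, and it uses the standard central-cut update through the current centroid (a valid relaxation of the cut in (7), since the centroid violates that constraint). The function names `mistake_normal` and `central_cut_update` are hypothetical.

```python
import math

def mistake_normal(n_agents, subset, observed, predicted):
    """Flattened normal of the halfspace sum_{i in S} w^i . (x - x_hat) > 0.

    Blocks for agents outside the subset S are zero (illustrative encoding).
    """
    diff = [o - p for o, p in zip(observed, predicted)]
    zero = [0.0] * len(diff)
    flat = []
    for i in range(n_agents):
        flat.extend(diff if i in subset else zero)
    return flat

def central_cut_update(center, shape, normal):
    """One central-cut ellipsoid step (requires dimension >= 2).

    The ellipsoid is {z : (z - center)^T shape^{-1} (z - center) <= 1}.
    The half kept is {z : normal . z >= normal . center}.
    """
    n = len(center)
    # shape @ normal
    sn = [sum(shape[i][j] * normal[j] for j in range(n)) for i in range(n)]
    scale = math.sqrt(sum(normal[i] * sn[i] for i in range(n)))
    g = [v / scale for v in sn]
    new_center = [center[i] + g[i] / (n + 1) for i in range(n)]
    factor = n * n / (n * n - 1.0)
    new_shape = [
        [factor * (shape[i][j] - 2.0 / (n + 1) * g[i] * g[j]) for j in range(n)]
        for i in range(n)
    ]
    return new_center, new_shape
```

On a mistake, a learner in this sketch would call `central_cut_update` with the normal returned by `mistake_normal` and then predict with the new center.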


We left the procedure used to solve the LP in the prediction rule of LearnEllipsoid unspecified.
To simplify our analysis, we use a specific LP solver to obtain a prediction $\hat{x}^{(t)}$ which is a vertex of
$\mathcal{P}^{(t)}$. We defer all the proofs of this section to Appendix D.

Theorem 12 (Theorem 6.4.12 and Remark 6.5.2 El). There exists an LP solver that runs in time
polynomial in the length of its input and returns an exact solution that is a vertex of $\mathcal{P}^{(t)}$.


In Theorem 13 we show that the number of mistakes made by LearnEllipsoid is at most the
number of updates that the Ellipsoid algorithm makes before it finds a point in $\mathcal{F}$, and the number of
updates of the Ellipsoid algorithm can be bounded by well-known results from the literature on LP.


Theorem 13. The total number of mistakes and the running time of LearnEllipsoid in the Known 
Constraints Problem is at most poly(n, d, N). 
A Polynomial Mistake Bound with Exponential Running Time


In this section we give a simple randomized algorithm for the unknown constraints problem, that in 
expectation makes a number of mistakes that is only linear in the dimension d, the number of rows 
in the unknown constraint matrix A (denoted by $m$), and the bit precision N, but which requires
exponential running time. When the number of rows is large, this can represent an exponential 
improvement over the mistake bound of LearnEdge, which is linear in the number of edges on the 
polytope V defined by A. This algorithm which we describe shortly is a randomized variant of the 
well known halving algorithm ifTSl . We leave it as an open problem whether the mistake bound 
achieved by this algorithm can also be achieved by a computationally efficient algorithm. 

Let $\mathcal{K}$ be the hypothesis class of all polytopes formed by $m$ constraints in $d$ dimensions, such that
each entry of each constraint can be written as a multiple of $1/2^N$ (and without loss of generality, up
to scaling, has absolute value at most 1). We then have

$$|\mathcal{K}| = 2^{O(dmN)}.$$

We write $\mathcal{K}^{(t+1)}$ to denote the polytopes that are consistent with the examples and solutions we have
seen up to and including day $t$. Note that $|\mathcal{K}^{(t)}| \ge 1$ for every $t$ because there is some polytope
(specifically the true unknown polytope $\mathcal{P}$) that is consistent with all the optimal solutions. On
each day $t$ we keep track of consistent polytopes and more specifically update the set of consistent
polytopes by

$$\mathcal{K}^{(t+1)} = \Big\{ \mathcal{P} \in \mathcal{K}^{(t)} \;\Big|\; x^{(t)} \in \operatorname{argmax}_{x \in \mathcal{P} \cap \mathcal{N}^{(t)}} c \cdot x \Big\} \qquad (8)$$

where $\mathcal{N}^{(t)}$ is the new constraint on day $t$. The formal description of the algorithm, FCP, is presented
in Algorithm 4. To predict at each day, FCP selects a polytope $\mathcal{P}^{(t)}$ from $\mathcal{K}^{(t)}$ uniformly at random
and guesses the point $\hat{x}^{(t)}$ that solves the following LP: $\max_{x \in \mathcal{P}^{(t)} \cap \mathcal{N}^{(t)}} c \cdot x$.


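Algorithm 4 Find Consistent Polytope (FCP)

procedure FCP
    $\mathcal{K}^{(1)} = \mathcal{K}$.    ▷ Initialize
    for $t = 1 \dots$ do
        Choose $\mathcal{P}^{(t)} \in \mathcal{K}^{(t)}$ uniformly at random.    ▷ Predict
        Guess $\hat{x}^{(t)} \in \operatorname{argmax}_{x \in \mathcal{P}^{(t)} \cap \mathcal{N}^{(t)}} c \cdot x$.
        Observe $x^{(t)}$ and set $\mathcal{K}^{(t+1)}$ as in (8).
end procedure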



We now bound the expected number of mistakes that FCP makes. 

Theorem 14. The expected number of mistakes that FCP makes is at most $\log(|\mathcal{K}|) = O(dmN)$,
where the expectation is over the randomness of FCP and possible randomness of the adversary.

Proof. First note that the probability that FCP does not make a mistake at day $t$ can be expressed as
$|\mathcal{K}^{(t+1)}| / |\mathcal{K}^{(t)}|$. This is because if FCP makes a mistake at day $t$, it must have selected a polytope that
will be eliminated at the next day (also note that FCP selects its polytope from among the consistent
set uniformly at random). Now consider the product of these probabilities over all days $t = 1, \dots, T$:

$$\prod_{t=1}^{T} \big(1 - \Pr[\text{Mistake at day } t]\big) = \prod_{t=1}^{T} \frac{|\mathcal{K}^{(t+1)}|}{|\mathcal{K}^{(t)}|} = \frac{|\mathcal{K}^{(T+1)}|}{|\mathcal{K}^{(1)}|}.$$

Finally, note that the expected number of mistakes is the sum of probabilities of making mistakes
over all days. Using the inequality $(1 - x) \le e^{-x}$ for every $x \in [0,1]$ and rearranging terms we get

$$\sum_{t=1}^{T} \Pr[\text{Mistake at day } t] \le \log \frac{|\mathcal{K}^{(1)}|}{|\mathcal{K}^{(T+1)}|} \le \log |\mathcal{K}| = O(dmN),$$

since $|\mathcal{K}^{(1)}| = |\mathcal{K}|$ and $|\mathcal{K}^{(T+1)}| \ge 1$. □
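The randomized halving scheme can be sketched on a toy one-dimensional instance. The setting below is an assumption made purely for illustration and is not from the paper: the objective is to maximize $x$, each hypothesis is a candidate upper bound on $x$ taken from a finite grid, and the known constraint of the day is a cap $x \le \text{cap}$, so the optimum is the minimum of the bound and the cap. The name `run_fcp` is hypothetical.

```python
import math
import random

def run_fcp(true_bound, hypotheses, caps, rng):
    """Randomized halving on a toy 1-D problem.

    Returns the number of mistakes and the hypotheses still consistent
    with every observed optimum.
    """
    consistent = list(hypotheses)
    mistakes = 0
    for cap in caps:
        chosen = rng.choice(consistent)      # uniform over the consistent set
        prediction = min(chosen, cap)        # optimum under the chosen hypothesis
        observed = min(true_bound, cap)      # optimum under the true bound
        if prediction != observed:
            mistakes += 1
        # Keep only hypotheses that explain the observed optimum, as in (8).
        consistent = [h for h in consistent if min(h, cap) == observed]
    return mistakes, consistent
```

The true hypothesis is never eliminated, and averaging the mistake count over many random seeds gives an empirical value that can be compared against the $\log |\mathcal{K}|$ bound of Theorem 14.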
Finally, we remark that the randomized halving technique above will also result in a polynomial
mistake bound in the more demanding variant where not only the underlying constraint matrix but
also the linear objective function is unknown. This is because the coefficients of the objective function
can be written in $dN$ bits if they are also represented with finite precision. However, the issue of
exponential running time persists in the new setting.

B Missing Proofs from Section 3

B.1 Section 3.1

Proof of Lemma 1^ Let x* be the optimal solution of the linear program solved over the unknown 
polytope $\mathcal{P}$, without the added constraint, i.e. $x^* = \operatorname{argmax}_{x \in \mathcal{P}} c \cdot x$.

1. Suppose that $x^* \in \mathcal{N}^{(t)}$; then clearly $x^{(t)} = x^*$. By Assumption[^ $x^*$ lies on a vertex of $\mathcal{P}$
and therefore $x^{(t)}$ lies on one of the edges of $\mathcal{P}$.

2. Suppose that $x^* \notin \mathcal{N}^{(t)}$, i.e. $p^{(t)} \cdot x^* > b^{(t)}$. Then we claim that the optimal solution
$x^{(t)}$ satisfies $p^{(t)} \cdot x^{(t)} = b^{(t)}$. Suppose to the contrary that $p^{(t)} \cdot x^{(t)} < b^{(t)}$. Since
$c \cdot x^* > c \cdot x^{(t)}$, then for any point $y \in \mathrm{Conv}(x^{(t)}, x^*)$,

$$c \cdot y = c \cdot (\alpha x^{(t)} + (1 - \alpha) x^*) = \alpha (c \cdot x^{(t)}) + (1 - \alpha)(c \cdot x^*) \ge c \cdot x^{(t)} \quad \forall \alpha \in [0,1].$$

Since $x^{(t)}$ strictly satisfies the new constraint, there exists some point $y^* \in \mathrm{Conv}(x^{(t)}, x^*)$
where $y^* \ne x^{(t)}$ such that $y^* \in \mathcal{P} \cap \mathcal{N}^{(t)}$ (i.e. $y^*$ is also feasible). It follows that $c \cdot y^* >
c \cdot x^{(t)}$, which contradicts Assumption[^ Therefore, $x^{(t)}$ must bind the additional constraint.
Furthermore, by the non-degeneracy assumption, $x^{(t)}$ binds exactly $(d - 1)$ constraints in $\mathcal{P}$,
i.e. $x^{(t)}$ lies at the intersection of $d - 1$ hyperplanes of $\mathcal{P}$ which are linearly independent by
Assumption]^ Therefore, $x^{(t)}$ must be on an edge of $\mathcal{P}$.


□ 

Proof of Lemma]^ Without loss of generality, let us assume y can be written as convex combination 
of $x$ and $z$, i.e. $y = \alpha x + (1 - \alpha) z$ for some $\alpha \in (0,1)$. Let $B_y = \{ j \mid A_j y = b_j \}$ be the set of
binding constraints for $y$. We know that $|B_y| \ge d - 1$ by Assumptionjm For any $j$ in $B_y$, we consider
the following two cases.

1. At least one of $x$ and $z$ belongs to the hyperplane $\{ w \mid A_j w = b_j \}$. Then we claim that all
three points bind the same constraint. Assume that $A_j x = b_j$; then we must have

$$A_j z = \frac{A_j (y - \alpha x)}{1 - \alpha} = \frac{b_j - \alpha b_j}{1 - \alpha} = b_j.$$

Similarly, if we assume $A_j z = b_j$, we will also have $A_j x = b_j$.

2. Neither $x$ nor $z$ belongs to the hyperplane $\{ w \mid A_j w = b_j \}$, i.e. $A_j x < b_j$ and $A_j z < b_j$
both hold. Then we can write

$$b_j = A_j y = \alpha A_j x + (1 - \alpha) A_j z < \alpha b_j + (1 - \alpha) b_j = b_j,$$

which is a contradiction.

It follows that for any $j \in B_y$, we have $A_j x = A_j y = A_j z = b_j$. Since $|B_y| \ge d - 1$, we know
by Assumption that the set of points that bind any set of $d - 1$ constraints in $B_y$ will form an
edge-space and further this edge-space will include $x$, $y$, and $z$. □

Proof of Lemma ]^ First, note that the observed solution $x^{(t)}$ is a vertex in the polytope $\mathcal{P}^{(t)} =
\mathcal{P} \cap \mathcal{N}^{(t)}$ that is an intersection of exactly $d$ constraints by Assumption]^ and Assumption]^ Second,
note that all points in $e$ bind at least $d - 1$ constraints in $\mathcal{P}$, and since $e$ lies in the hyperplane of the
new constraint, all points in $e$ bind at least $d$ constraints in $\mathcal{P}^{(t)}$. It follows that any vertex of $\mathcal{P}^{(t)}$ on $e$
must bind at least $(d + 1)$ constraints, which rules out the possibility of $x^{(t)}$ being on $e$. □
B.2 Section 3.2


Proof of Lemma 1^ As soon as LearnEdge invokes update rule U.1, it records the solution $x^* =
\operatorname{argmax}_{x \in \mathcal{P}} c \cdot x$. Then, the prediction rule specified by P.1 prevents further updates of this type. This
is because $x^*$ continues to remain optimal if it is feasible in the more constrained problem (optimizing
over the polytope $\mathcal{P}^{(t)}$). □

Proof of Lemma|^ Rule U.2 is invoked only when $x^{(t)} \notin X^{(t)}$ and $x^{(t)} \notin e$ for any $e \in E^{(t)}$. So
after each invocation, a new point on an edge of $\mathcal{P}$ is observed. Whenever 3 points are observed on
the same edge of $\mathcal{P}$, the edge-space is learned by Lemma|^ (since the points are necessarily collinear).
Hence, the total number of times rule U.2 can be invoked is at most $3|E_{\mathcal{P}}|$. □


We now introduce Lemmas 16 and 17, which will be used in the proof of Lemma 18, which itself will
be useful in the proof of Lemma 7. But first, for completeness, in Lemma 15 we show that we are
guaranteed the existence of an edge-space if the update implemented is U.3 or U.4.

Lemma 15.

(1) If update rule U.3 is used, then there exists an edge-space $e \in E^{(t)}$ such that $\hat{x}^{(t)} \in e$.

(2) If update rule U.4 is used, then there exists an edge-space $e \in E^{(t)}$ such that $x^{(t)} \in e$.

Proof. We prove this by contradiction. First consider the case in which $c \cdot \hat{x}^{(t)} > c \cdot x^{(t)}$ and suppose
$\hat{x}^{(t)} \in \{x \in X^{(t)} \mid \forall e \in E^{(t)}, x \notin e\}$. When this is the case we know that $\hat{x}^{(t)}$ is feasible at day $t$,
and this contradicts $x^{(t)}$ being optimal at that day because $c \cdot \hat{x}^{(t)} > c \cdot x^{(t)}$.

Next consider the case in which $c \cdot \hat{x}^{(t)} < c \cdot x^{(t)}$ and suppose $x^{(t)} \in \{x \in X^{(t)} \mid \forall e \in E^{(t)}, x \notin e\}$.
We would have used P.3 to make a prediction because $\mathrm{Cand}^{(t)} \cap \mathrm{Ext}^{(t)}$ is non-empty and includes at least
the point $x^{(t)}$. Note that by P.3 we have $\hat{x}^{(t)} = \operatorname{argmax}_{x \in \mathrm{Cand}^{(t)} \cap \mathrm{Ext}^{(t)}} c \cdot x$. Since $x^{(t)} \in \mathrm{Cand}^{(t)} \cap \mathrm{Ext}^{(t)}$,
we must also have $c \cdot \hat{x}^{(t)} \ge c \cdot x^{(t)}$, which is again a contradiction. □


Lemma 16. If U.3 is implemented at day $t$, then $\hat{x}^{(t)} \notin \mathcal{P}$ and $\hat{x}^{(t)} \in (Q_e^i)^{(t)} \cap \mathrm{Ext}^{(t)}$ for some
$i = 0$ or $1$, where $e$ is given in Lemma 15.

Proof. Each time the algorithm makes update U.3, we know that the algorithm's prediction $\hat{x}^{(t)}$ was
on some edge-space $e \in E^{(t)}$ by Lemma 15. Therefore, LearnEdge did not use P.1 or P.2 to predict
$\hat{x}^{(t)}$. So we only need to check P.3 and P.4.

• If P.3 was used, we know that $\hat{x}^{(t)} \in \mathrm{Ext}^{(t)}$, but $\hat{x}^{(t)}$ must violate a constraint of $\mathcal{P}$, due to
$x^{(t)}$ being the observed solution and having lower objective value. This implies that $\hat{x}^{(t)}$ is
in some questionable region, say $(Q_e^i)^{(t)}$ for $i = 0$ or $1$, but also in the extended feasible region on
$e$, i.e. $\hat{x}^{(t)} \in \mathrm{Ext}^{(t)} \cap (Q_e^i)^{(t)}$.

• If P.4 was used, then $\mathrm{Cand}^{(t)} \cap \mathrm{Ext}^{(t)} = \emptyset$. However, LearnEdge selected $\hat{x}^{(t)}$ from $\mathrm{Cand}^{(t)}$
with the lowest objective value. Finally, when updating with U.3, (i) $x^{(t)} \in \mathrm{Cand}^{(t)}$ and (ii)
$c \cdot x^{(t)} < c \cdot \hat{x}^{(t)}$. So we could not have used P.4 to predict $\hat{x}^{(t)}$.

□


Lemma 17. If U.4 is implemented at day $t$, then $x^{(t)} \in (Q_e^i)^{(t)} \setminus \mathrm{Ext}^{(t)}$ for some $i = 0$ or $1$, where $e$
is given in Lemma 15.

Proof. As in Lemma 16, LearnEdge did not use P.1 or P.2 to predict $\hat{x}^{(t)}$ (again by application of
Lemma 15). So we only need to check P.3 and P.4.

• If P.3 was used, then LearnEdge did not guess $x^{(t)}$, which had the higher objective, because
it was outside of $\mathrm{Ext}^{(t)}$ along edge-space $e$. Since $x^{(t)}$ is feasible, it must have been on some
questionable region on $e$, say $(Q_e^i)^{(t)}$ for some $i = 0$ or $1$. Hence, $x^{(t)} \in (Q_e^i)^{(t)} \setminus \mathrm{Ext}^{(t)}$.

• If P.4 was used, then $\mathrm{Cand}^{(t)} \cap \mathrm{Ext}^{(t)} = \emptyset$ and thus $x^{(t)}$ was a candidate solution but outside of the
extended feasible interval along edge-space $e$. Further, because $x^{(t)} \in \mathcal{P}$ we know that $x^{(t)}$
must be in some questionable interval along $e$, say $(Q_e^i)^{(t)}$ for some $i = 0$ or $1$. Therefore,
$x^{(t)} \in (Q_e^i)^{(t)} \setminus \mathrm{Ext}^{(t)}$.

□

Lemma 18. Each time U.3 or U.4 is used, there is a questionable interval on some edge-space whose
length is decreased by at least a factor of two.


Proof. From Lemma 16 we know that if U.3 is used then $\hat{x}^{(t)} \in e$ is infeasible but outside of the
known infeasible interval and inside of the extended feasible interval along $e$. Note that if
a point $x$ is infeasible along edge-space $e$ in the questionable interval $(Q_e^i)^{(t)}$, then the constraint
it violates is also violated by all points in $(Y_e^i)^{(t)}$. Hence the interval $\mathrm{Conv}(x, (Y_e^i)^{(t)})$ contains only
infeasible points. By the definition of the midpoint $(M_e^i)^{(t)}$ and the fact that $\hat{x}^{(t)}$ is in the extended feasible region
on $e$, we know that

$$|(Q_e^i)^{(t+1)}| = |(Q_e^i)^{(t)} \setminus \mathrm{Conv}(\hat{x}^{(t)}, (Y_e^i)^{(t)})| \le |(Q_e^i)^{(t)} \setminus \mathrm{Conv}((M_e^i)^{(t)}, (Y_e^i)^{(t)})| = \frac{|(Q_e^i)^{(t)}|}{2}.$$

Further, from convexity we know that if $x^{(t)}$ is feasible on edge-space $e$ at day $t$, then the interval
$\mathrm{Conv}(x^{(t)}, (F_e)^{(t)})$ only contains feasible points on $e$. We know that $x^{(t)}$ is feasible and in a questionable
interval along edge-space $e$ but outside its extended feasible region, by Lemma 17. Thus,
by definition of the midpoint $(M_e^i)^{(t)}$ we have

$$|(Q_e^i)^{(t+1)}| = |(Q_e^i)^{(t)} \setminus \mathrm{Conv}(x^{(t)}, (F_e)^{(t)})| \le |(Q_e^i)^{(t)} \setminus \mathrm{Conv}((M_e^i)^{(t)}, (F_e)^{(t)})| = \frac{|(Q_e^i)^{(t)}|}{2}.$$

□


Proof of Lemma 7. Let $Q_e^i$ be the updated questionable interval. We know initially $Q_e^i$ has length at
most $2\sqrt{d}$ by Assumption]^ In Lemma 18 we showed that each time an update U.3 or U.4 is
invoked, the length of $Q_e^i$ decreases by at least a half. Then after at most $O(N \log(d))$ updates, the
interval will have length less than $2^{-N}$, after which the interval will be updated at most once because
there is at most one point up to precision $N$ in it.

Therefore, the total number of updates on $Q_e^i$ is bounded by $O(N \log(d))$. Since there are
at most $2|E_{\mathcal{P}}|$ questionable intervals, the total number of updates U.3 and U.4 is bounded by
$O(|E_{\mathcal{P}}| N \log(d))$. □
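The halving count in this argument can be checked with a short sketch. It assumes, as in the proof, an initial interval of length $2\sqrt{d}$ that is exactly halved on each update until its length falls below $2^{-N}$; the function names are hypothetical and the code only illustrates the counting, not LearnEdge itself.

```python
import math

def halvings_until_below(initial_length, precision_bits):
    """Count exact halvings needed before the length drops below 2**(-precision_bits)."""
    length = initial_length
    target = 2.0 ** (-precision_bits)
    count = 0
    while length >= target:
        length /= 2.0
        count += 1
    return count

def questionable_interval_updates(d, n_bits):
    """Halvings for an interval that starts at length 2 * sqrt(d)."""
    return halvings_until_below(2.0 * math.sqrt(d), n_bits)
```

Under these assumptions the count is at most $N + 2 + \tfrac{1}{2}\log_2 d$, which is within the $O(N \log(d))$ bound used above.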


B.3 Section 3.3

We prove the lower bound in Theorem[^initially for d = 3. 

Theorem 19. If Assumptions^and^hold, then the number of mistakes of any learning algorithm in 
the known objective problem is at least $\Omega(N)$ for $d = 3$.


Proof. The high level idea of the proof is as follows. On each day the adversary can pick two points
on the two bold edges in Figure 2 as the optimal points, and no matter what the learner predicts, the
adversary can return a point that is different from the learner's guess as the optimal point. If the
adversary picks the midpoint of the questionable region on each day, then the size of the questionable
region on both of the lines will shrink in half. So this process can be repeated $N$ times, where each
entry of every vertex can be written as a multiple of $1/2^N$, by the finite-precision assumption. Finally, we show
that at the end of this process, the adversary can return a simple polytope which is consistent with all
the observed optimal points so far.

We formalize this high level idea in procedure ADVERSARY, which takes as input any learning algorithm $\mathcal{L}$
and interacts with $\mathcal{L}$ for $N$ days. Each day the adversary presents a constraint. Then no matter what
$\mathcal{L}$ predicts, the adversary ensures that $\mathcal{L}$'s prediction is incorrect. After $N$ interactions, the adversary
outputs a feasible polytope that is consistent with all of the previous actions of the adversary.
Figure 2: The underlying polytope in the proof of Theorem 19. The two learned edges are in bold.



In procedure ADVERSARY, subroutines NAC and AD-2 are used to pick a constraint and return an
optimal point that causes $\mathcal{L}$ to make a mistake, respectively. We use the notation $\mathrm{mid}(R)$ in subroutines
NAC and AD-2 to denote the middle point of a real interval $R$, $\mathrm{top}(R)$ to be the largest point in $R$, and
$\mathrm{bot}(R)$ to be the smallest value in $R$. Finally, we assume the known objective function is $c = (0,0,1)$.


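Algorithm 5 Adversary Updates (ADVERSARY)

Input: Any learning algorithm $\mathcal{L}$ and bit precision $N$
Output: Polytope $\mathcal{P}$ that is consistent with $\mathcal{L}$ making a mistake each day.
procedure ADVERSARY($\mathcal{L}$, $N$)
    Set $R_1^{(1)} = [0,1]$, $R_2^{(1)} = [1,2]$.    ▷ Initialize
    for $t = 1, \dots, N$ do
        $((p^{(t)}, q^{(t)}), r_1^{(t)}, r_2^{(t)}) \leftarrow \mathrm{NAC}(R_1^{(t)}, R_2^{(t)})$    ▷ Constraint
        Show constraint $p^{(t)} \cdot x \le q^{(t)}$ to $\mathcal{L}$.
        Get prediction $\hat{x}^{(t)}$ from $\mathcal{L}$.
        $(x^{(t)}, R_1^{(t+1)}, R_2^{(t+1)}) \leftarrow \text{AD-2}(R_1^{(t)}, R_2^{(t)}, r_1^{(t)}, r_2^{(t)}, \hat{x}^{(t)})$    ▷ Update
        Reveal the optimal $x^{(t)} \ne \hat{x}^{(t)}$ and update the regions $R_1^{(t+1)}$ and $R_2^{(t+1)}$.
    $A, b \leftarrow \mathrm{MATRIX}(R_1^{(N+1)}, R_2^{(N+1)})$    ▷ Constraint matrix consistent with $\{x^{(t)} \mid t \in [N]\}$
    return $A, b$
end procedure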


The procedure NAC takes as input two real-valued intervals and outputs two points $r_1$ and $r_2$ as
well as the new constraint, denoted by the pair $(p, q)$. The two points are used as input to AD-2
along with the learner's prediction. In procedure AD-2 the adversary makes sure that the learner
suffers a mistake. On each day, one of the points produced by NAC, say $r_2$, has a higher objective
than the other one. If the learner chooses $r_2$, then the adversary will simply choose a polytope that
makes $r_2$ infeasible, so that $r_1$ is actually the optimal point that day. If the learner chooses $r_1$, then the
adversary picks $r_2$ as the optimal solution. Note that the three points $r_1$, $r_2$, and $r_3$ computed in NAC
all bind the constraint and are not collinear, and thus uniquely define the hyperplane $\{x : p \cdot x = q\}$.
Finally, in AD-2 the adversary updates the new feasible region for use on the following days.

ADVERSARY finishes by outputting the polytope that is consistent with the constraints and
the optimal solutions shown on each day. This polytope is defined by the constraint matrix $A$ and
vector $b$ produced by the subroutine MATRIX, together with the nonnegativity constraint $x \ge 0$.

To prove that the procedure given in ADVERSARY does in fact make every learner $\mathcal{L}$ make a mistake on
every day, we need to show that (i) there exists a simple unknown polytope that is consistent with
what the adversary has presented in the previous days. Furthermore, we need to show that (ii) the
optimal point returned by the adversary on each day is indeed the optimal point corresponding to the
LP with objective $c$ and unknown constraints subject to the additional constraint added on each day.
Algorithm 6 New Adversarial Constraint (NAC)

procedure NAC($R_1$, $R_2$)
    Set $\varepsilon \leftarrow 0.01$.
    Set $r_1 \leftarrow (0, 1, \mathrm{mid}(R_1))$,
        $r_2 \leftarrow (1, \mathrm{mid}(R_2), 1 + \varepsilon \cdot \mathrm{mid}(R_2))$,
        $r_3 \leftarrow (1, \mathrm{mid}(R_2), 0)$.
    Set $p = (1 - \mathrm{mid}(R_2), 1, 0)$ and $q = 1$    ▷ The constraint is $p \cdot x \le 1$ and binds at $r_1$, $r_2$, $r_3$
    return $(p, q)$ and $r_1$, $r_2$.
end procedure


Algorithm 7 Adaptive Adversary (AD-2)

procedure AD-2($R_1$, $R_2$, $r_1$, $r_2$, $\hat{x}$)
    if $\hat{x} == r_2$ then    ▷ $r_1$ and $r_2$ as in Algorithm 6
        $x \leftarrow r_1$.
        $R_2 \leftarrow [\mathrm{bot}(R_2), \mathrm{mid}(R_2)]$.
    else
        $x \leftarrow r_2$.
        $R_2 \leftarrow [\mathrm{mid}(R_2), \mathrm{top}(R_2)]$.
    $R_1 \leftarrow [\mathrm{mid}(R_1), \mathrm{top}(R_1)]$.
    return $x$, $R_1$, $R_2$
end procedure
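The two subroutines can be sketched directly in code to check their stated properties. This is an illustrative reading of NAC and AD-2, not the authors' code: intervals are represented as `(bot, top)` pairs of exact fractions, the helper names `nac`, `ad2`, `mid`, and `dot` are hypothetical, and `nac` additionally returns $r_3$ so that the binding claim can be verified.

```python
from fractions import Fraction

EPS = Fraction(1, 100)  # epsilon = 0.01, as set in NAC

def mid(interval):
    """Midpoint of an interval given as a (bot, top) pair."""
    lo, hi = interval
    return (lo + hi) / 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nac(region1, region2):
    """Return the constraint (p, q) and the points r1, r2, r3 that bind it."""
    m1, m2 = mid(region1), mid(region2)
    r1 = (Fraction(0), Fraction(1), m1)
    r2 = (Fraction(1), m2, 1 + EPS * m2)
    r3 = (Fraction(1), m2, Fraction(0))
    p = (1 - m2, Fraction(1), Fraction(0))
    q = Fraction(1)
    return (p, q), r1, r2, r3

def ad2(region1, region2, r1, r2, prediction):
    """Return an optimum different from the prediction and the halved regions."""
    if prediction == r2:
        answer = r1
        region2 = (region2[0], mid(region2))
    else:
        answer = r2
        region2 = (mid(region2), region2[1])
    region1 = (mid(region1), region1[1])
    return answer, region1, region2
```

Running `nac` followed by `ad2` for $N$ rounds, starting from the regions $[0,1]$ and $[1,2]$, halves both regions each round and always returns a point that differs from the prediction.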


To show (i) note that the point $r_1^{(t)} = (0, 1, \mathrm{mid}(R_1^{(t)}))$ is always a feasible point for $t \in [N]$ in the
polytope given by $A$ and $b$, and the new constraint added each day will allow $r_1^{(t)}$ to remain feasible.

To show (ii) first note that the new constraint added is always a binding constraint. So by Assumption|C
it is sufficient to check the intersection of the edges of the polytope output by MATRIX and the
newly added hyperplane and return the (feasible) point with the highest objective as the optimal
point. Second, the following equations define the edges of the polytope, which are one-dimensional
subspaces $e^{i,j}$ according to Assumptionj^ with $A$ and $b$ being the output of MATRIX:

$$e^{i,j} = \{ x \in \mathbb{R}^3 \mid A_i x = b_i \text{ and } A_j x = b_j \}, \quad i, j \in \{1, 2, 3, 4\},\ i \ne j,$$

where $A_i$ is the $i$th row of $A$. Since the first two constraints define two parallel hyperplanes, we only
need to consider 5 edges. Let

$$r_1^{(t)} = \big(0, 1, \mathrm{mid}(R_1^{(t)})\big)$$

and

$$r_2^{(t)} = \big(1, \mathrm{mid}(R_2^{(t)}), 1 + \varepsilon \cdot \mathrm{mid}(R_2^{(t)})\big).$$

We show that the new constraint either intersects the edges of the polytope at $r_1^{(t)}$ or $r_2^{(t)}$, or does not
intersect them at all. This will prove that the optimal points shown by the adversary each day are
consistent with the unknown polytope.

Algorithm 8 Matrix consistent with adversary (MATRIX)

procedure MATRIX($R_1$, $R_2$)
    Set $f_1 = (\mathrm{top}(R_1) + \mathrm{bot}(R_1))/2$ and $f_2 = (\mathrm{top}(R_2) + \mathrm{bot}(R_2))/2$ and $\varepsilon > 0$.
    Set $A = \begin{pmatrix} -1 & 0 & 0 \\ 1 & 0 & 0 \\ (f_1 - 1 - \varepsilon) & -\varepsilon & 1 \\ -(f_2 - 1) \cdot f_1 & f_1 & 0 \end{pmatrix}$ and $b = (0, 1, f_1 - \varepsilon, f_1)^\top$.
    return $A$ and $b$
end procedure

1. The edge $(0, 0, f_1) \cdot s + (0, 1, 0)$, which intersects the new hyperplane at $r_1^{(t)}$.

2. The edge $(0, f_2, \varepsilon \cdot f_2) \cdot s + (1, 0, 1)$, which intersects the new hyperplane at $r_2^{(t)}$.

3. The edge $(0, 0, 1 + \varepsilon \cdot f_2) \cdot s + (1, f_2, 0)$, which does not intersect the new hyperplane unless
$\mathrm{mid}(R_2) = f_2$ (which does not happen).

4. The edge $(1, f_2 - 1, 1 + \varepsilon \cdot f_2 - f_1) \cdot s + (0, 1, f_1)$, which does not intersect the new hyperplane
unless $\mathrm{mid}(R_2) = f_2$ (which does not happen).

5. The edge $(0, -1, f_1) \cdot s + (0, 1, f_1)$, which never intersects the hyperplane.


And this concludes the proof. 


□ 


We now prove Theorem]^ even for d > 3. 

Proof of Theorem 1^ We modify the proof of Theorem 19 to d > 3 by adding dummy variables. 
These dummy variables are denoted by $x_4, \dots, x_d$. Furthermore, we add dummy constraints $x_i \ge 0$
for all the dummy variables. We modify the objective function in the proof of Theorem 19 to be
$c = (0, 0, 1, -1, \dots, -1)$. This will cause all the newly added variables to have no effect on the
optimization (they should be set to 0 in the optimal solution) and, hence, the result from Theorem 19
extends to the case when $d > 3$. □


B.4 Section 3.4


Proof of Lemma 10. First, since all of the points $x_1, \dots, x_t$ are drawn i.i.d. from $\mathcal{D}$, we observe by
symmetry that the event we are interested in is distributed identically to the following event: draw
a set of $t$ points $X' = \{x_1, \dots, x_t\}$ i.i.d. from $\mathcal{D}$, select an index $i \in \{1, \dots, t\}$ uniformly at
random, and compute the probability that $x_i \notin \mathrm{Conv}(X' \setminus \{x_i\})$. In other words

$$\Pr_{X}\big[x_t \notin \mathrm{Conv}(X)\big] = \Pr_{X',\, i \in \{1, \dots, t\}}\big[x_i \notin \mathrm{Conv}(X' \setminus \{x_i\})\big]. \qquad (9)$$

We analyze the quantity on the right hand side of (9) instead, fixing the choices of $x_1, \dots, x_t$,
and analyzing the probability only over the randomness of the choice of index $i$. For each edge
$e \in E_{\mathcal{P}}$, let $X'_e = X' \cap e$. Since each edge lies on a one dimensional subspace, there are at most
two extreme points $x_e^1, x_e^2 \in X'_e$ that lie outside of the convex hull of the other points, i.e. such that
$x_e^1 \notin \mathrm{Conv}(X' \setminus \{x_e^1\})$ and $x_e^2 \notin \mathrm{Conv}(X' \setminus \{x_e^2\})$. We note that when we choose an index $i$
uniformly at random, the probability that we select a point $x \in X'_e$ is exactly $|X'_e|/t$, and conditioned
on selecting a point $x \in X'_e$, the probability that $x$ is an extreme point (i.e. $x \in \{x_e^1, x_e^2\}$) is at most
$2/|X'_e|$. Hence, we can calculate

$$\Pr\big[x_i \notin \mathrm{Conv}(X' \setminus \{x_i\})\big] = \sum_{e \in E_{\mathcal{P}}} \Pr[x_i \in X'_e] \cdot \Pr\big[x_i \notin \mathrm{Conv}(X'_e \setminus \{x_i\}) \mid x_i \in X'_e\big] \le \sum_{e \in E_{\mathcal{P}}} \frac{|X'_e|}{t} \cdot \frac{2}{|X'_e|} = \sum_{e \in E_{\mathcal{P}}} \frac{2}{t} = \frac{2|E_{\mathcal{P}}|}{t}.$$

□
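The counting step of this proof can be illustrated with a small sketch. It makes a simplifying assumption that is not in the paper: each edge is modelled as a list of one-dimensional coordinates, and a point is called extreme when it lies outside the interval spanned by the other points on the same edge. The function name `extreme_indices` is hypothetical.

```python
import random

def extreme_indices(points_by_edge):
    """Return (edge, index) labels of points outside the hull of the other
    points on the same edge. Each edge maps to a list of 1-D coordinates."""
    extremes = []
    for edge, coords in points_by_edge.items():
        for k, value in enumerate(coords):
            others = coords[:k] + coords[k + 1:]
            # With no other points on the edge, the point is trivially extreme.
            if not others or value < min(others) or value > max(others):
                extremes.append((edge, k))
    return extremes
```

Since each edge contributes at most two extreme points, a uniformly random index among $t$ points hits an extreme point with probability at most $2|E_{\mathcal{P}}|/t$, matching the bound above.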

Proof of Theorem First, we show that LearnHull makes a mistake only if the true optimal 
point $x^{(t)}$ lies outside of the convex hull formed by the previous observed optimal points
$\{x^{(1)}, \dots, x^{(t-1)}\}$. Suppose that at day $t$, the algorithm predicts the point $\hat{x}^{(t)}$ instead of the optimal
point $x^{(t)}$. Since each point in this convex hull is feasible and $\hat{x}^{(t)}$ is the point with the highest objective
value among the points of the hull satisfying $p^{(t)} \cdot x \le b^{(t)}$, it must be that $x^{(t)}$ lies outside the hull, because
otherwise $c \cdot \hat{x}^{(t)} \ge c \cdot x^{(t)}$. By Lemma 10 we also know that the probability that $x^{(t)}$ lies
outside of the hull is no more than $2|E_{\mathcal{P}}|/t$ in expectation, which also upper bounds the probability
of LearnHull making a mistake at day $t$. Therefore, the expected number of mistakes made by
LearnHull over $T$ days is bounded by the sum of probabilities of making a mistake in each day,
which is $\sum_{t=1}^{T} 2|E_{\mathcal{P}}|/t = O(|E_{\mathcal{P}}| \log(T))$. □

Proof of Theorem 11. The deterministic procedure runs $\lceil 18 \log(1/\delta) \rceil$ independent instances of
LearnHull, each using independently drawn examples. The independent instances are aggregated
into a single prediction rule by predicting using the modal prediction (if one exists), and otherwise
predicting arbitrarily. Hence, the aggregate prediction is correct whenever at least half of the instances
of LearnHull are correct.

We show that if each instance of LearnHull is run for $T = 8|E_{\mathcal{P}}|$ days, then the probability that more
than half of the instances of LearnHull make a mistake on a newly drawn constraint at day $T + 1$ is
at most $\delta$. The result is that with probability at least $1 - \delta$ the majority of instances of LearnHull
predict the correct optimal point, and hence the aggregate prediction is also correct.

Let $Z_i$ be the random variable that denotes the probability that the $i$th instance of the LearnHull
algorithm makes a mistake on a fresh example, after it has been trained for $T$ days. By Theorem|^
we know $E[Z_i] \le 1/4$ for all $i$. Now by Markov's inequality,

$$\Pr\big[Z_i \ge 3 \cdot E[Z_i]\big] \le 1/3$$

for all $i$. Hence, the expected fraction of instances that make a mistake is at most 1/3. Finally, since
each instance is trained on independent examples, a Chernoff bound implies that the probability that
at least half of the instances of LearnHull make a mistake is bounded by $\delta$.


□ 


C Circumventing the Lower Bound when $d \le 2$


In the lower bound of Section 3.3, we proved the necessity of the finite precision assumption by showing that the
dependence on the precision parameter $N$ in our mistake bound is tight. However, that lower bound
requires the dimension $d$ to be at least 3.


We now show that this condition on the dimension is indeed necessary: even without the finite
precision assumption, we can have (computationally efficient) algorithms with small
mistake bounds when the dimension $d \le 2$.


In $d = 1$, at most two constraints are sufficient to determine any constraint matrix $A$ because the
constraint matrix $A$ defines a feasible interval on the real line. So we will guess the value that
maximizes the objective subject to the single known constraint. Once we have made a mistake, we
must have learned the true optimum of the underlying problem because our guess was infeasible. After
this single mistake, we either guess the true optimum that we have already seen or, if it is not feasible
with the new constraint, we guess the point that maximizes the objective subject to the new
constraint. Thus after one mistake, the learner will not make any more mistakes.
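The one-dimensional strategy can be written out as a minimal sketch. The concrete setting is an assumption for illustration: the objective is to maximize $x$, the unknown polytope contributes an upper bound on $x$, and the known constraint of each day is a cap of the form $x \le \text{cap}$. The name `one_dim_learner` is hypothetical.

```python
def one_dim_learner(upper_bound, daily_caps):
    """Count the mistakes of the d = 1 strategy on a toy instance.

    The true optimum each day is min(upper_bound, cap); the learner does
    not know upper_bound until its first mistake reveals it.
    """
    known_optimum = None  # optimum of the underlying problem, once observed
    mistakes = 0
    for cap in daily_caps:
        if known_optimum is not None and known_optimum <= cap:
            guess = known_optimum      # the known optimum is still feasible
        else:
            guess = cap                # maximize subject to the known constraint only
        truth = min(upper_bound, cap)
        if guess != truth:
            mistakes += 1
            known_optimum = truth      # the guess was infeasible, so truth is the true optimum
    return mistakes
```

In this sketch a mistake can only happen the first time a cap exceeds the unknown bound, so the learner makes at most one mistake on any sequence of caps.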

Lemma[^tells us that the line between any three collinear points must give us an edge-space of the 
underlying polytope. When d = 2, the corresponding edge-space is then just one of the original 
constraints of the underlying polytope. Since each solution must be on an edge of the underlying 
polytope each day, we can make at most $3m$ mistakes without seeing the true objective. Hence, all
together, we can make at most $3m + 1$ mistakes before we recover all the constraints of the underlying
polytope, or all the rows of the constraint matrix $A$, and see the true optimal solution.

This phenomenon does not continue to hold for $d > 2$ (as we show in our lower bound in Section 3.3).


D Missing Proofs from Section]^ 


First we state Theorem 20 from Grotschel et al. ifTTl about the running time of the Ellipsoid algorithm.

Theorem 20. Let $\mathcal{P} \subseteq \mathbb{R}^d$ be a polytope given as the intersection of linear constraints, each specified
with $N$ bits of precision. Given access to a separation oracle which can return, for each candidate
solution $p$, a hyperplane with $N$ bits of precision that separates $p$ from $\mathcal{P}$, the Ellipsoid algorithm
outputs a point $p' \in \mathcal{P}$ or outputs that $\mathcal{P}$ is empty after at most $\mathrm{poly}(d, N)$ iterations.


We are now ready to bound the number of mistakes that LearnEllipsoid makes. 

Proof of Theorem 13. Whenever LearnEllipsoid makes a mistake, there exists a separating
hyperplane

$$\mathcal{H}^{(t)} = \Big\{ W = (w^1, \dots, w^n) \in \mathbb{R}^{n \times d} : \sum_{i \in S^{(t)}} w^i \cdot (x^{(t)} - \hat{x}^{(t)}) > 0 \Big\}$$

that causes the Ellipsoid algorithm to run for another iteration. When LearnEllipsoid reduces to a
set that contains a single point up to $N$ bits of precision, then predicting via the above equation for $\mathcal{H}$
will ensure we never make a mistake again. Hence, the number of mistakes that LearnEllipsoid
can commit against an adversary is bounded by the maximum number of iterations for which the
Ellipsoid algorithm can be made to run, in the worst case.

We know that each day $x^{(t)}$ is on a vertex of the polytope $\mathcal{P}^{(t)}$, which is guaranteed to have coordinates
specified with at most $N$ bits of precision by Assumption]?] Theorem 12 guarantees that the solution
that LearnEllipsoid produces using the following equation

$$\hat{x}^{(t)} \in \operatorname{argmax}_{x \in \mathcal{P}^{(t)}} \sum_{i \in S^{(t)}} (w^i)^{(t)} \cdot x$$

is a vertex solution of $\mathcal{P}^{(t)}$. So $\hat{x}^{(t)}$ can also be written with $N$ bits of precision by Assumption]^
Thus, every constraint in $\mathcal{F}$, and hence each separating hyperplane, can be written with $d \cdot N$ bits of
precision. By Assumption]^ and Theorem 20, we know that the Ellipsoid algorithm will find a point
in the feasible region $\mathcal{F}$ after at most $\mathrm{poly}(n, d, N)$ many iterations, which is the same as the number
of mistakes of LearnEllipsoid. □




